Multilingual Dependency-based Syntactic and Semantic Parsing
Authors
Abstract
Our CoNLL 2009 Shared Task system includes three cascaded components: syntactic parsing, predicate classification, and semantic role labeling. A pseudo-projective high-order graph-based model is used in our syntactic dependency parser. A support vector machine (SVM) model is used to classify predicate senses. Semantic role labeling is achieved using maximum entropy (MaxEnt) model-based semantic role classification and integer linear programming (ILP) based post-inference. Finally, we won first place in the joint task, in both the closed and open challenges.

1 System Architecture

Our system for the CoNLL 2009 Shared Task on multilingual syntactic and semantic dependencies (Hajič et al., 2009) includes three cascaded components: syntactic parsing, predicate classification, and semantic role labeling.

2 Syntactic Dependency Parsing

We extend our CoNLL 2008 graph-based model (Che et al., 2008) in four ways:

1. We use bigram features to choose multiple possible syntactic labels for one arc, and decide the optimal label during decoding.
2. We extend the model with sibling features (McDonald, 2006).
3. We extend the model with grandchildren features. Rather than using only the left-most and right-most grandchildren, as Carreras (2007) and Johansson and Nugues (2008) did, we use all left and right grandchildren in our model.
4. We adopt the pseudo-projective approach introduced by Nivre and Nilsson (2005) to handle the non-projective languages, namely Czech, German, and English.

2.1 Syntactic Label Determining

The model of Che et al. (2008) decided one label for each arc before decoding according to unigram features, which lowered the labeled attachment score (LAS). On the other hand, keeping all possible labels for each arc makes decoding inefficient. Therefore, in this year's system we adopt an approximate technique as a compromise, as shown in the following formulas:

$f^{lbl}_{uni}(h, c, l) = f^{lbl}_{1}(h, 1, d, l) \cup f^{lbl}_{1}(c, 0, d, l)$
$L_1(h, c) = \arg\max^{K_1}_{l \in L} \; w \cdot f^{lbl}_{uni}(h, c, l)$
$f^{lbl}_{bi}(h, c, l) = f^{lbl}_{2}(h, c, l)$
$L_2(h, c) = \arg\max^{K_2}_{l \in L_1(h, c)} \; w \cdot \{ f^{lbl}_{uni} \cup f^{lbl}_{bi} \}$

For each arc, we first use the unigram features to choose the $K_1$-best labels. The second parameter of $f^{lbl}_{1}(\cdot)$ indicates whether the node is the head of the arc, and the third parameter, $d$, indicates the direction of the arc. $L$ denotes the whole label set. We then re-rank these labels by adding the bigram features and choose the $K_2$-best labels. During decoding, we use only the $K_2$ labels chosen for each arc ($K_2 < K_1 < |L|$); a sketch of this two-stage pruning is given below, after Section 2.2.

2.2 High-order Model and Algorithm

Following the Eisner (2000) algorithm, we use spans as the basic unit. A span is defined as a substring of the input sentence whose sub-tree has already been produced. Only the start or end words of a span can link with other spans. In this way, the algorithm parses the left and right dependents of a word independently and combines them at a later stage. We follow McDonald (2006)'s implementation of the first-order Eisner parsing algorithm, modifying its scoring method to incorporate high-order features. Our extended algorithm is shown in Algorithm 1. There are four different span-combining operations; here we explain two of them, which correspond to a right arc ($s < t$), as shown in Figures 1 and 2. We ...
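The two-stage label pruning of Section 2.1 can be illustrated with a short sketch. This is not the authors' implementation: the feature extractors `unigram_feats` and `bigram_feats`, the weight vector `w`, and the default `k1`/`k2` values are hypothetical placeholders; a trained model would supply real features and weights.

```python
# A minimal sketch of the two-stage label pruning in Section 2.1: unigram
# features pick the K1-best labels for an arc, bigram head-child features
# re-rank them, and only the K2-best labels survive for decoding.
from collections import defaultdict

def dot(w, feats):
    # Dot product of a sparse weight vector and a list of feature strings.
    return sum(w[f] for f in feats)

def unigram_feats(word, is_head, direction, label):
    # Placeholder unigram features; real ones would use form, lemma, POS, etc.
    return [f"uni:{word}:{is_head}:{direction}:{label}"]

def bigram_feats(head, child, label):
    # Placeholder bigram (head, child) features.
    return [f"bi:{head}:{child}:{label}"]

def prune_labels(w, head, child, direction, labels, k1=10, k2=2):
    # Stage 1: score every label with unigram features of head and child sides.
    uni = {l: dot(w, unigram_feats(head, 1, direction, l) +
                     unigram_feats(child, 0, direction, l))
           for l in labels}
    k1_best = sorted(uni, key=uni.get, reverse=True)[:k1]
    # Stage 2: add bigram features and re-rank only the K1-best labels.
    bi = {l: uni[l] + dot(w, bigram_feats(head, child, l)) for l in k1_best}
    return sorted(bi, key=bi.get, reverse=True)[:k2]

# Toy usage with a hand-set weight vector.
w = defaultdict(float, {"bi:saw:dog:OBJ": 1.5, "uni:dog:0:R:OBJ": 0.7})
print(prune_labels(w, "saw", "dog", "R", ["SBJ", "OBJ", "NMOD"], k1=3, k2=2))
```

Only the surviving K2 labels per arc then enter the decoder, which is what keeps labeled decoding tractable.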
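For Section 2.2, the sketch below shows only the standard first-order Eisner span-combination dynamic program that the high-order model extends. The arc-factored scoring function `arc_score(head, modifier)` is an assumption for illustration, index 0 is an artificial root, and back-pointer recovery of the tree is omitted for brevity.

```python
# A minimal sketch of first-order Eisner span combination (O(n^3)).
# Returns the score of the best projective dependency tree over words 1..n
# headed by the artificial root 0; arc_score(h, m) scores an arc h -> m.

def eisner_best_score(n, arc_score):
    NEG = float("-inf")
    # complete[s][t][d] / incomplete[s][t][d]: best score of span (s, t)
    # in direction d (0 = head at t, arcs point left; 1 = head at s, right).
    complete = [[[NEG, NEG] for _ in range(n + 1)] for _ in range(n + 1)]
    incomplete = [[[NEG, NEG] for _ in range(n + 1)] for _ in range(n + 1)]
    for s in range(n + 1):
        complete[s][s][0] = complete[s][s][1] = 0.0

    for length in range(1, n + 1):
        for s in range(n + 1 - length):
            t = s + length
            # Create an arc between the end words of two adjacent complete spans.
            best = max(complete[s][r][1] + complete[r + 1][t][0]
                       for r in range(s, t))
            incomplete[s][t][0] = best + arc_score(t, s)  # head t -> modifier s
            incomplete[s][t][1] = best + arc_score(s, t)  # head s -> modifier t
            # Combine an incomplete span with an adjacent complete span.
            complete[s][t][0] = max(complete[s][r][0] + incomplete[r][t][0]
                                    for r in range(s, t))
            complete[s][t][1] = max(incomplete[s][r][1] + complete[r][t][1]
                                    for r in range(s + 1, t + 1))

    return complete[0][n][1]  # best projective tree rooted at word 0

# Toy usage: 3 words after the root, with a hypothetical score table.
scores = {(0, 2): 2.0, (2, 1): 1.5, (2, 3): 1.0}
print(eisner_best_score(3, lambda h, m: scores.get((h, m), 0.0)))  # 4.5
```

The authors' extension keeps this chart structure but modifies the scoring of the span-combining operations to incorporate the sibling and grandchild features described above.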
Similar resources
Automatic Semantic Role Labeling in Persian Sentences Using Dependency Trees
Automatically identifying words that bear semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching the correct semantic roles to them may lead to improvements in many natural language processing tasks, including information extraction, question answering, text summarization, and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...
Feature Engineering in Persian Dependency Parser
The dependency parser is one of the most important fundamental tools in natural language processing; it extracts the structure of sentences and determines the relations between words based on dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this paper, a data-driven dependency parser has been developed with the help of a phrase-structure parser fo...
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing of natural language that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform the POS tagging task along with dependency parsing in a pipeline mode. Unfortunately, in pipel...
Multilingual Syntactic-Semantic Dependency Parsing with Three-Stage Approximate Max-Margin Linear Models
This paper describes a system for syntactic-semantic dependency parsing for multiple languages. The system consists of three parts: a state-of-the-art higher-order projective dependency parser for syntactic dependency parsing, a predicate classifier, and an argument classifier for semantic dependency parsing. For semantic dependency parsing, we explore the use of global features. All components are ...
Multilingual Dependency Learning: Exploiting Rich Features for Tagging Syntactic and Semantic Dependencies
This paper describes our multilingual syntactic and semantic dependency parsing system, built for our participation in the joint task of the CoNLL-2009 Shared Task. Our system uses rich features and incorporates various integration technologies. The system is evaluated on the in-domain and out-of-domain evaluation data of the closed challenge of the joint task. For in-domain evaluation, our system ranks the se...
Multilingual Joint Parsing of Syntactic and Semantic Dependencies with a Latent Variable Model
Current investigations in data-driven models of parsing have shifted from purely syntactic analysis to richer semantic representations, showing that the successful recovery of the meaning of text requires structured analyses of both its grammar and its semantics. In this article, we report on a joint generative history-based model to predict the most likely derivation of a dependency parser for...
Publication date: 2009